Rough Sets in Data Warehousing
Abstract
The theory of rough sets [15,16] builds on the universal framework of information systems. It provides a powerful model for representing patterns and dependencies, both in databases and in data mining. There are numerous applications of rough sets to data mining and knowledge discovery [10,18], but their use inside database engines remains largely uncharted territory. This situation is not exceptional. Even the best-known paradigms of machine learning, soft computing, artificial intelligence, and approximate reasoning still await wider recognition in database research, despite their considerable potential in areas such as physical data model tuning and adaptive query optimization [2,3]. Rough set-based algorithms and similar techniques can improve database performance by using automatically discovered dependencies to handle query conditions more effectively [5,9]. Another idea is to use the available information to compute rough approximations of the data needed to resolve queries, and thereby to help the database engine access the relevant data [20,24].

In our approach, we partition data into rough rows, each consisting of 64K original rows. We automatically label rough rows with compact information about their values on particular columns, often involving multi-table cross-relationships. In effect, we create a new information system in which objects are rough rows and attributes correspond to various kinds of rough row information. A number of database operations can be processed fully or partially within this new system. Access to the original data pieces remains available whenever it is required on top of the rough row information. Such a framework fits the paradigms of rough and granular computing [1,17], in which calculations on granules may additionally interact with calculations on single items.
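The rough row mechanism described above can be illustrated with a minimal Python sketch. For illustration only, it assumes that the compact information attached to each rough row is the minimum and maximum of a single integer column. The names `RoughRow`, `build_rough_rows`, `classify`, and `count_between` are hypothetical and not taken from the text, and so are the labels "relevant", "irrelevant", and "suspect". For a range condition, the rough rows labeled relevant form a lower approximation of the rows that satisfy it. The relevant and suspect rough rows together form an upper approximation. Only suspect rough rows require access to the original values.

```python
from dataclasses import dataclass
from typing import List, Tuple

ROUGH_ROW_SIZE = 64 * 1024  # 64K original rows per rough row, as in the text


@dataclass
class RoughRow:
    values: List[int]  # original values of one column (the "single items")
    lo: int            # ASSUMED summary: minimum value in this rough row
    hi: int            # ASSUMED summary: maximum value in this rough row


def build_rough_rows(column: List[int], size: int = ROUGH_ROW_SIZE) -> List[RoughRow]:
    """Partition a column into consecutive rough rows and label each with min/max."""
    rough_rows = []
    for start in range(0, len(column), size):
        chunk = column[start:start + size]
        rough_rows.append(RoughRow(chunk, min(chunk), max(chunk)))
    return rough_rows


def classify(rr: RoughRow, a: int, b: int) -> str:
    """Classify a rough row against the condition a <= x <= b using only its summary."""
    if rr.hi < a or rr.lo > b:
        return "irrelevant"  # no row can satisfy the condition
    if a <= rr.lo and rr.hi <= b:
        return "relevant"    # every row satisfies the condition (lower approximation)
    return "suspect"         # some rows may satisfy it (upper minus lower approximation)


def count_between(rough_rows: List[RoughRow], a: int, b: int) -> Tuple[int, int]:
    """Count rows with a <= x <= b; also report how many rough rows needed original data."""
    total, accessed = 0, 0
    for rr in rough_rows:
        label = classify(rr, a, b)
        if label == "relevant":
            total += len(rr.values)  # answered from rough row information alone
        elif label == "suspect":
            accessed += 1            # fall back to the original values
            total += sum(1 for v in rr.values if a <= v <= b)
    return total, accessed
```

For example, with a deliberately small rough row size of 4, `count_between(build_rough_rows(list(range(10)), size=4), 2, 7)` returns `(6, 1)`. One rough row is answered from its summary alone, one is skipped entirely, and only one needs its original values. Richer summaries, such as the multi-table cross-relationships the text mentions, would be further attributes of the same rough row information system and are not modeled here.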
These ideas guided us toward implementing a fully functional database product. Its interfaces are provided through integration with MySQL [13,14]. Its internals draw on trends in database research such as columnar stores [8,11] and adaptive compression [6,22]. Relying on relatively small, flexible rough row information made us especially competitive in analytical data warehousing, where users want to analyze terabytes of data in complex, dynamically changing ways. We realize, though, that we should keep comparing ourselves against other strategies for using data about data [4,12]. We should also keep redesigning various dependency, pattern, metadata, and index structures originally defined over single rows, so that they work at our rough row level [7,19]. In particular, searching
Publication date: 2008